Relationship between the intercept and slope

Even in a simple linear regression there is a Variance-Covariance matrix for the two parameters of the model: the intercept and the slope. The model here is:

\[ y_i = \beta_0 + \beta_1 x_i + e_i \]

It is natural for these parameters to be correlated, because it is unlikely for both of them to take extreme values at the same time. Here is a simple non-Bayesian example:

## Load packages (ggplot2 for plots; nlraa provides simulate_lm and boot_lm)
library(ggplot2)
library(nlraa)
## Simulate data
xx <- 1:20
yy <- 3 + 1.5 * xx + rnorm(length(xx))
dat <- data.frame(x = xx, y = yy)
## Visualize
ggplot(data = dat, aes(x = x, y = y)) + 
  geom_point()

## Fit a linear model
fit <- lm(y ~ x, data = dat)
## Beta coefficients
coef(fit)
## (Intercept)           x 
##    2.763020    1.526467
## Variance-Covariance for beta
vcov(fit)
##             (Intercept)            x
## (Intercept)  0.24087609 -0.017625080
## x           -0.01762508  0.001678579
## Correlation matrix
cov2cor(vcov(fit))
##             (Intercept)          x
## (Intercept)   1.0000000 -0.8765231
## x            -0.8765231  1.0000000
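
The strong negative correlation above can also be derived analytically for simple least squares. With \(\bar{x} > 0\), the covariance between the estimated intercept and slope is negative:

\[ Cov(\hat{\beta}_0, \hat{\beta}_1) = - \hat{\sigma}^2 \, \frac{\bar{x}}{\sum_i (x_i - \bar{x})^2} \]

Here \(\bar{x} = 10.5\) and \(\sum_i (x_i - \bar{x})^2 = 665\), so with \(\hat{\sigma}^2 \approx 1.12\) this reproduces the covariance of about \(-0.0176\) shown above.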
## Visualize model fits
set.seed(1234)
fit.s <- simulate_lm(fit, nsim = 5, value = "data.frame")
ggplot(data = fit.s, aes(x = x, y = y)) + 
  geom_line(aes(y = sim.y, group = ii), color = "purple", alpha = 0.3) + 
  geom_point() 

## We can also visualize the negative correlation among the parameters
fit.b <- boot_lm(fit)
hist(fit.b, 1, ci = "perc")

hist(fit.b, 2, ci = "perc")

## Visualizing the correlation
plot(fit.b$t, xlab = "Intercept", ylab = "Slope")

The function simulate_lm internally uses the function MASS::mvrnorm, which simulates samples from a multivariate normal distribution. boot_lm performs bootstrapping, which involves simulating data similar to the observed data and re-fitting the model (by default 1000 times). This is a way of empirically deriving the distribution of the parameters.
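
A minimal sketch of this idea (assuming the `fit` object from above): draw parameter vectors from a multivariate normal centered at the estimates, with the estimated variance-covariance matrix.

```r
## Draw 1000 plausible (intercept, slope) pairs
library(MASS)
betas <- mvrnorm(n = 1000, mu = coef(fit), Sigma = vcov(fit))
## The draws display the same negative intercept-slope correlation
cor(betas[, "(Intercept)"], betas[, "x"])
```

The correlation of the simulated draws should be close to the value returned by cov2cor(vcov(fit)).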

The brms way

How to fit a simple Bayesian linear regression using brms

library(brms)
priors <- prior(normal(0, 5), class = "Intercept") + 
  prior(normal(0, 4), coef = "x")
## Fit model
fbrms <- brm(y ~ x, data = dat, prior = priors, refresh = 0)
## Compiling the C++ model
## Start sampling
## Plotting posterior distributions of parameters
plot(fbrms, "b_")

## Extract posterior samples
ps <- posterior_samples(fbrms)
## Simple plot
plot(ps[, 1:2], xlab = "Intercept", ylab = "Slope")

## Or using pairs
pairs(fbrms, "^b_")

## We can extract equivalent quantities
vcov(fbrms)
##             Intercept            x
## Intercept  0.32344397 -0.023751584
## x         -0.02375158  0.002240561
cov2cor(vcov(fbrms))
##            Intercept          x
## Intercept  1.0000000 -0.8822967
## x         -0.8822967  1.0000000

How do we fit a model with varying intercepts and slopes in brms

library(nlme)
data(barley, package = "nlraa")
## Fit one regression for each year, with year as a random effect
barley$NF2 <- barley$NF^2
br.lme <- lme(yield ~ NF + NF2, random = ~ NF + NF2 | year, data = barley)
## Fixed or population-level effects
fixef(br.lme)
## (Intercept)          NF         NF2 
##  133.422688   35.440069   -1.391495
## Variance-covariance for beta
vcov(br.lme)
##               (Intercept)         NF          NF2
## (Intercept) 165.214652036 -3.0824699 -0.007421724
## NF           -3.082469869  5.7281654 -0.265883674
## NF2          -0.007421724 -0.2658837  0.019519108
cov2cor(vcov(br.lme))
##              (Intercept)         NF          NF2
## (Intercept)  1.000000000 -0.1001998 -0.004132859
## NF          -0.100199784  1.0000000 -0.795158888
## NF2         -0.004132859 -0.7951589  1.000000000
## Bayesian way
prs <- prior(normal(100, 50), class = Intercept) + 
  prior(normal(50, 15), class = "b", coef = "NF") + 
  prior(normal(0, 5), class = "b", coef = "NF2") + 
  prior(student_t(2, 0, 150), class = "sd")
## Fit the multilevel model
br.brms <- brm(yield ~ NF + NF2 + (NF + NF2 | year), 
               data = barley, refresh = 0, iter = 10000,
               cores = 4,
               seed = 1234,
               prior = prs, control = list(adapt_delta = 0.95,
                                           max_treedepth = 15))
## Compiling the C++ model
## Start sampling
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
## I get some complaints, but the model fit seems fine
plot(br.brms, "^b_")

pairs(br.brms, "^b_")

pp <- predict(br.brms)  
pp2 <- cbind(barley, pp)
## Plot
ggplot(data = pp2, aes(NF, yield)) + 
  geom_point() + 
  facet_wrap(~year) + 
  geom_line(aes(y = Estimate)) + 
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), fill = "purple", alpha = 0.3)

More on Covariances

Covariances arise any time we have two or more random variables. For a single variable we have a mean and a variance.

\[ \mu = E(X) \qquad \sigma^2 = E[(X - E(X))^2] \]

When we have two variables \(X_1\) and \(X_2\), the variance-covariance matrix of the pair is

\[ Cov \left( \begin{array}{l} X_1 \\ X_2 \end{array} \right) = \left[ \begin{array}{ll} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{array} \right] \]

In the simplest of linear regression models we have

\[ Cov(Y) = Cov(e) = \sigma^2 \times I_{n \times n} \]

However, this covariance can be modeled in much more complex ways when the data are more complex.
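
As a small self-contained illustration of the 2 x 2 matrix above (simulated data, not from this example):

```r
## Two correlated random variables
set.seed(123)
x1 <- rnorm(500)
x2 <- 0.5 * x1 + rnorm(500, sd = 0.5)
## Variance-covariance matrix: variances on the diagonal,
## covariances off the diagonal
cov(cbind(x1, x2))
## Scaling by the standard deviations gives the correlation matrix
cov2cor(cov(cbind(x1, x2)))
```

The off-diagonal entries of the two matrices are the sample covariance and the sample correlation, respectively.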

library(nlme)
library(nlraa)
library(ggplot2)
data(ChickWeight)

dim(ChickWeight)
## [1] 578   4
fm1 <- lm(weight ~ Time, data = ChickWeight)

ggplot(data = ChickWeight, aes(Time, weight)) + 
  geom_point() + 
  ggtitle("Chick weight with time")

ggplot(data = ChickWeight, aes(Time, weight, color = Chick)) + 
  geom_point() + 
  geom_line(aes(y = fitted(fm1)), color = "black") + 
  theme(legend.position = "none")

## Fit a linear model (gls) with an increasing 
## variance and a CAR1 correlation structure
fit4 <- gls(weight ~ Time, data = ChickWeight, 
            weights = varPower(),
            correlation = corCAR1(form = ~ Time | Chick))

v4 <- var_cov(fit4)
## Tip: you can visualize these matrices using
image(log(v4[,ncol(v4):1]))

## Take a closer look
v42 <- v4[1:36, 1:36]
image(log(v42[,ncol(v42):1]))

## Look at raw numbers
round(v4[1:12, 1:12])
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]   89  129  178  236  302  377  462  555  658   770   890   954
##  [2,]  129  194  267  353  452  565  692  832  985  1152  1333  1428
##  [3,]  178  267  379  501  642  803  982 1181 1400  1637  1893  2029
##  [4,]  236  353  501  684  877 1096 1341 1613 1911  2235  2585  2769
##  [5,]  302  452  642  877 1160 1449 1774 2133 2527  2956  3418  3663
##  [6,]  377  565  803 1096 1449 1869 2287 2750 3259  3811  4408  4723
##  [7,]  462  692  982 1341 1774 2287 2888 3473 4115  4813  5566  5964
##  [8,]  555  832 1181 1613 2133 2750 3473 4310 5106  5972  6907  7400
##  [9,]  658  985 1400 1911 2527 3259 4115 5106 6241  7300  8443  9046
## [10,]  770 1152 1637 2235 2956 3811 4813 5972 7300  8809 10188 10916
## [11,]  890 1333 1893 2585 3418 4408 5566 6907 8443 10188 12158 13026
## [12,]  954 1428 2029 2769 3663 4723 5964 7400 9046 10916 13026 14177
## Look at ChickWeight
head(ChickWeight, 12)
## Grouped Data: weight ~ Time | Chick
##    weight Time Chick Diet
## 1      42    0     1    1
## 2      51    2     1    1
## 3      59    4     1    1
## 4      64    6     1    1
## 5      76    8     1    1
## 6      93   10     1    1
## 7     106   12     1    1
## 8     125   14     1    1
## 9     149   16     1    1
## 10    171   18     1    1
## 11    199   20     1    1
## 12    205   21     1    1
## Look at the correlation matrix
round(cov2cor(v4[1:12, 1:12]), 2)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90 0.88  0.87  0.86  0.85
##  [2,] 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91 0.90  0.88  0.87  0.86
##  [3,] 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92 0.91  0.90  0.88  0.88
##  [4,] 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94 0.92  0.91  0.90  0.89
##  [5,] 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95 0.94  0.92  0.91  0.90
##  [6,] 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97 0.95  0.94  0.92  0.92
##  [7,] 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98 0.97  0.95  0.94  0.93
##  [8,] 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00 0.98  0.97  0.95  0.95
##  [9,] 0.88 0.90 0.91 0.92 0.94 0.95 0.97 0.98 1.00  0.98  0.97  0.96
## [10,] 0.87 0.88 0.90 0.91 0.92 0.94 0.95 0.97 0.98  1.00  0.98  0.98
## [11,] 0.86 0.87 0.88 0.90 0.91 0.92 0.94 0.95 0.97  0.98  1.00  0.99
## [12,] 0.85 0.86 0.88 0.89 0.90 0.92 0.93 0.95 0.96  0.98  0.99  1.00

Some magic happened when we fitted the model using gls. The varPower option assumes that the variances on the diagonal can be modeled as a power of the variance covariate \(v\) (here the fitted values):

\[ \sigma^2(v) = \sigma^2 \, |v|^{2 \theta} \]

fit4
## Generalized least squares fit by REML
##   Model: weight ~ Time 
##   Data: ChickWeight 
##   Log-restricted-likelihood: -2037.874
## 
## Coefficients:
## (Intercept)        Time 
##   33.874048    2.866917 
## 
## Correlation Structure: Continuous AR(1)
##  Formula: ~Time | Chick 
##  Parameter estimate(s):
##       Phi 
## 0.9922111 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~fitted(.) 
##  Parameter estimates:
##    power 
## 2.481534 
## Degrees of freedom: 578 total; 576 residual
## Residual standard error: 0.001508325
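
With the printed residual standard error and power estimate (about 2.48), the diagonal of the matrix above can be reconstructed by hand. A sketch, assuming the `fit4` and `v4` objects from above:

```r
## The diagonal of var_cov(fit4) follows from the varPower formula:
## Var(e_i) = sigma^2 * |fitted_i|^(2 * theta)
sgma <- fit4$sigma                          ## residual standard error
theta <- coef(fit4$modelStruct$varStruct)   ## estimated power
dgnl <- sgma^2 * abs(fitted(fit4))^(2 * theta)
## These should agree with the diagonal of v4
cbind(head(dgnl), head(diag(v4)))
```

The two columns should match, which confirms how the fitted variance function generates the diagonal of the covariance matrix.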

The correlation was modeled using corCAR1, a continuous-time first-order autoregressive structure (https://en.wikipedia.org/wiki/Autoregressive_model).
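
Under corCAR1, the correlation between two residuals within the same Chick is \(\phi^{|t_i - t_j|}\), where \(\phi\) is the estimated Phi parameter. A short check against the correlation matrix shown earlier:

```r
## corCAR1 implies cor(e_i, e_j) = Phi^|t_i - t_j| within a Chick
phi <- 0.9922111  ## Phi estimate printed above
tm <- c(seq(0, 20, by = 2), 21)  ## the 12 measurement times
round(outer(tm, tm, function(ti, tj) phi^abs(ti - tj)), 2)
```

This reproduces the 12 x 12 correlation matrix for the first Chick: for example, \(\phi^2 \approx 0.98\) for adjacent times and \(\phi^{21} \approx 0.85\) for the first and last.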

SAS-cov
